Writing SHA-256 in Python

SHA-256 has become an incredibly useful cryptographic function. It has the capability to convert any data into a unique 32-byte string (256 bits respectively). The real magic comes from its following two properties.

  1. Any change in the incoming data will produce a completely new and unique 32-byte string (if two pieces of data can produce the same string it is called a collision and can be an enormous security problem)
  2. The original data cannot be reproduced from the hash, therefore the function is one directional. This means the hash can be shared freely without any fear of the original data being

Go ahead and try below. I guarantee you will not be able to obtain the same key for any two strings. Change a single leter to a capital, add a single comma, or swap two values and you'll get a completely different hash.

How (not why, that's worth a post in itself) exactly does the SHA-256 do this? Well, lets make our own SHA-256 function to find out! This post will work through the Python function created which is actually running above.

Preprocessing Data

The first thing to do is to create a function that takes in a string and converts it to binary. In this example we'll limit our data to string values for representation sake. With this we can easily convert teach value to it's unicode code point by using Python's ord() function and convert this integer to binary by using Python 3's format function.


In [1]:
def _str_to_bin(data_string):
    """Returns the binary representation of a string using unicode (or ASCII)
       representation of the string values

    args:
        data_string (str): Incoming string in unicode/ASCII format

    return args:
        binary_data (str): Binary representation of the string
    """
    print("String values:", list(data_string))
    unicode_points = [ord(char) for char in data_string]
    print("Unicode points:", unicode_points)
    binary_values = ['{0:08b}'.format(point) for point in unicode_points]
    print("Binary values:", binary_values)
    binary_data = ''.join(binary_values)
    print('Binary data: ', binary_data)
    return binary_data

binary_data = _str_to_bin('abc')


String values: ['a', 'b', 'c']
Unicode points: [97, 98, 99]
Binary values: ['01100001', '01100010', '01100011']
Binary data:  011000010110001001100011

Now that we have our binary data we need to preprocess it to have it in the correct format for the future SHA-256 functions. This is is achieved with the following three properties.

  1. A 1 is added to the end of the binary data.
  2. The length of the binary data (3 x 8 = 24) is appended to the end of the binary blob in a 64 bit number.
  3. The final 1 and the 64 bit number from 1. and 2. are separated by 0s so that the total length of the binary blob is a multiple of 512.

The purpose of this step is to have 512 bit blocks which will be used for each step of the SHA-256 process. Furthermore, this 512 bit block will be used in 16 smaller 32 bit blocks for each sub-process of each step. Therefore, for the data to be easily used, and in binary format it is also converted back to int 512 bit tuples each with 16 32-bit integers.


In [22]:
def preprocess_data(binary_data):
    print("Binary data:", binary_data)
    data_length = '{0:064b}'.format(len(binary_data))
    print("Length:", len(binary_data), "bits")
    print(len(binary_data), "in 64 bit binary format:", data_length)
    padding = '0' * (512 - (len(binary_data) + 1 + 64) % 512)
    print("Padding needed for a 512 bit block:", len(padding), "0s")
    binary_string = binary_data + '1' + padding + data_length
    print("Final binary string:", binary_string)
    processed_values = [int(binary_string[i:i + 32], 2)
                        for i in range(0, len(binary_string), 32)]
    output_data = [tuple(processed_values[i:i + 16]) 
                   for i in range(0, len(processed_values), 16)]
    print("Final data ints for processing: ", output_data)
    return output_data

preprocessed_data = preprocess_data(binary_data)


Binary data: 011000010110001001100011
Length: 24 bits
24 in 64 bit binary format: 0000000000000000000000000000000000000000000000000000000000011000
Padding needed for a 512 bit block: 423 0s
Final binary string: 01100001011000100110001110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011000
Final data ints for processing:  [(1633837952, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24)]

Bitwise Operations Used for SHA-256

So far we haven't touched anything to do with the SHA-256 process. The power of SHA-256 comes from two main factors:

  1. Completely random seed values
  2. A number of bitwise operations (operations which move 1s and 0s around in a specific manor)

The bitwise operations used in SHA-256 are as follows.

Rotate right operation (ROTR)
Rotates the binary data right n units, for example:
ROTR(100010, 1) --> 010001
ROTR(100010, 2) --> 101000

Shift right operation (SHR)
Same as ROTR, however, data on the right side is lost and the left side is filled with 0s
SHR(100010, 1) --> 010001
SHR(100010, 2) --> 001000

Choose operation (Ch)
Takes 3 equal length binary values, x, y, and z. For each binary value the operation checks if x is 1 or 0. If x is 1, the value is chosen from y and if x is 0 the value is chosen from z.

```

| x      | 1010 |
| y      | 0011 |
| z      | 1100 |
| output | 0110 |

```

Majority operation (Maj)
Takes 3 equal length binary values, x, y, and z. For each binary value the operation checks the total numbers of 1s and 0s in x, y and z. If a majority of the values are 1 the output is 1 for that position and if the majority are 0s the output is 0.

```

| x      | 1010 |
| y      | 0011 |
| z      | 1100 |
| output | 1010 |

```

The SHA-256 Seeds

Now that we have our manipulation functions, what are we going to do with them? Well, in the SHA-256 function you begin with a random value which, as recommended by the NSA, are "the first sixty-four bits of the fractional parts of the square roots of the ninth through sixteenth prime numbers". These are as below:

6a09e667 bb67ae85 3c6ef372 a54ff53a 510e527f 9b05688c 1f83d9ab 5be0cd19

Furthermore, the SHA-256 algorithm is interesting in that it continuously feeds in random data in each scrambling iteration (64 iterations for each 512 bit block). The random data that is fed in with each iteration is seen below as well, called K. You can see there are 8x8 seeds which makes the total of 64 seeds to pass in.

428a2f98 71374491 b5c0fbcf e9b5dba5 3956c25b 59f111f1 923f82a4 ab1c5ed5
d807aa98 12835b01 243185be 550c7dc3 72be5d74 80deb1fe 9bdc06a7 c19bf174
e49b69c1 efbe4786 0fc19dc6 240ca1cc 2de92c6f 4a7484aa 5cb0a9dc 76f988da
983e5152 a831c66d b00327c8 bf597fc7 c6e00bf3 d5a79147 06ca6351 14292967
27b70a85 2e1b2138 4d2c6dfc 53380d13 650a7354 766a0abb 81c2c92e 92722c85
a2bfe8a1 a81a664b c24b8b70 c76c51a3 d192e819 d6990624 f40e3585 106aa070
19a4c116 1e376c08 2748774c 34b0bcb5 391c0cb3 4ed8aa4a 5b9cca4f 682e6ff3
748f82ee 78a5636f 84c87814 8cc70208 90befffa a4506ceb bef9a3f7 c67178f2

Just Give Me the Algorithm Already!

Alright, there's been a lot of background to set up for the actual function. Alas, here it is. Starting with the beginner seed from above we split it into 8 sections called a, b, c, d, e, f, g and h. So:

a = 6a09e667
b = bb67ae85
c = 3c6ef372
d = a54ff53a
e = 510e527f
f = 9b05688c
g = 1f83d9ab
h = 5be0cd19

From here on we calculate a new set of a - h. In each step b becomes a, c becomes b, d becomes c, etc. However, the key is that both a and d are added with a random value called W, T1 and T2. W is calculated for the first 16 iterations from the raw data (that we preprocessed earlier). From the 16th to 64th iteration, however, it is calculated as follows.

new_W = sigma_1(W[t-2]) + W[t-7] + sigma_0(W[t-15]) + W[t-16]

T1 and T2 are also calculated as below, where K is each piece of the 64 seeds from aboce:

T1 = h + Epsilon_1(e) + Ch(e, f, g) + K[t] + W[t]
T2 = Epsilon_0(a) + Maj(a, b, c)

Now we can calculate our new a - h values, with Python we can easily assign all these in a single line.

a, b, c, d, e, f, g, h = T1 + T2, a, b, c, d + T1, e, f, g

Repeat this 64 times and you're done! The full algorithm can be seen below.

def sha256(input_message):
    """Performs the SHA-256 algorithm on the incoming string. This is performed
       by converting the string to a binary string via unicode positions on
       which the permutations are performed. The result is then converted to a
       hexidecimal string and returned from the function

    args:
        input_message (str): Incoming string to be converted to SHA-256 digest

    output args:
        sha_256_digest (str):
            The resulting hash from the data_string in hexidecimal format
    """
    binary_data = str_to_bin(input_message)
    M = preprocess_data(binary_data)
    for i in range(len(M)):
        a, b, c, d, e, f, g, h = H[i]

        W = list(M[i])
        for t in range(64):
            if t >= 16:
                new_W = sigma_1(W[t-2]) + W[t-7] + sigma_0(W[t-15]) + W[t-16]
                W.append(hex8(new_W))
            T1 = hex8(h + Epsilon_1(e) + Ch(e, f, g) + K[t] + W[t])
            T2 = hex8(Epsilon_0(a) + Maj(a, b, c))
            e, f, g, h = hex8(d + T1), e, f, g
            a, b, c, d = hex8(T1 + T2), a, b, c

        H.append([
            hex8(a + H[i][0]),
            hex8(b + H[i][1]),
            hex8(c + H[i][2]),
            hex8(d + H[i][3]),
            hex8(e + H[i][4]),
            hex8(f + H[i][5]),
            hex8(g + H[i][6]),
            hex8(h + H[i][7]),
        ])

    return ' '.join(['{:08x}'.format(val).upper() for val in H[-1]])

As you can see the supporting functions are missing. Please check out my Github (there's full installation instructions) if you want the fully tested, and fully functioning function! Furthermore, the field at the top of this blog is actually connected through Amazon Lambda and is the live functioning python code calculating your new SHA-256 digests!